Comparing the Accuracy of Large Language Models (LLM) in Trending Obstetrical Topics

Amber Khemlani; Joshua Singavarapu; Ranjitha Vasa; Huber Rodriguez-Tejada; Harsh Reshamwala; Ozgul Muneyyirci-Delale; Mudar Dalloul

doi:https://doi.org/10.22259/2638-5244.0701003

Abstract

Generative artificial intelligence (AI) is rapidly expanding in medicine, and both patients and healthcare providers increasingly rely on large language model (LLM) chatbots for information. In this study, we evaluated four AI chatbots (ChatGPT 4.0, Gemini 3.7, Copilot AI, and Perplexity AI) by analyzing their responses to queries related to three obstetrical pathologies: preeclampsia, placental abruption, and gestational diabetes mellitus. Queries for the top five obstetrical pathologies were obtained from U.S. Google Trends data spanning December 10, 2019, to December 10, 2024. AI-generated responses were assessed using validated evaluation tools: the Patient Education Material Assessment Tool (PEMAT) for understandability and actionability, DISCERN for information quality, and the Flesch-Kincaid formula for readability. AI-generated content was also reviewed for alignment with guidelines from the American College of Obstetricians and Gynecologists (ACOG). PEMAT scores for understandability and actionability were analyzed using chi-square tests, while DISCERN and Flesch-Kincaid scores were evaluated using the Kruskal-Wallis test. ChatGPT showed promising results on PEMAT understandability, PEMAT actionability, and DISCERN scores. The Flesch-Kincaid readability scores were similar across all four chatbots, with every response written at a high school grade level. This indicates a need for AI chatbots to tailor their responses to readers at varying grade levels. Furthermore, in anticipation of a future where AI becomes the primary source of information, it is important to continually challenge and evaluate LLMs for accuracy and potential misinformation.

Keywords: Preeclampsia, Placental Abruption, Gestational Diabetes Mellitus, Artificial Intelligence, Obstetrics.
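
The abstract names the Flesch-Kincaid formula as its readability measure. As an illustration only, the grade-level form of that formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, can be sketched in Python. The function names and the vowel-group syllable counter below are assumptions made for this sketch and are not taken from the study; published tools count syllables more carefully, so scores may differ slightly.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a likely silent trailing "e" (e.g. "hypertensive"), but not "-le" or "-ee".
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of an English text."""
    # Sentences are split on runs of ., ! or ?; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    clinical = "Preeclampsia is a hypertensive disorder of pregnancy."
    print(round(flesch_kincaid_grade(simple), 2))
    print(round(flesch_kincaid_grade(clinical), 2))
```

A higher value corresponds to a higher U.S. school grade level, so a chatbot response dense with clinical vocabulary would be expected to score well above a plain-language one.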

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Volume & Issue

Volume 7, Issue 1

Article Type

Research Article

How to Cite

Amber Khemlani, Joshua Singavarapu, Ranjitha Vasa, Huber Rodriguez-Tejada, Harsh Reshamwala, Ozgul Muneyyirci-Delale, Mudar Dalloul. (2025-05-09). "Comparing the Accuracy of Large Language Models (LLM) in Trending Obstetrical Topics." *Volume 7*, 1, 18-22
